Skip to content

feat: refactor cleanup cronjob to use pagination and reduce memory usage#411

Open
davidmogar wants to merge 1 commit into
mainfrom
oomkill
Open

feat: refactor cleanup cronjob to use pagination and reduce memory usage#411
davidmogar wants to merge 1 commit into
mainfrom
oomkill

Conversation

@davidmogar

Copy link
Copy Markdown
Member

Replace AWK-based processing with kubectl pagination API to handle large resource sets more efficiently. Process resources in batches of 100 using continuation tokens, and stream through jq to minimize memory footprint. Increase memory limit from 256Mi to 512Mi to accommodate batch processing. Add deletion and error counters for better observability.

Generated-By: Claude Sonnet 4.5 noreply@anthropic.com

@qodo-for-redhat-appstudio

qodo-for-redhat-appstudio Bot commented May 29, 2026

Copy link
Copy Markdown

Review Summary by Qodo

(Agentic_describe updated until commit f49ddc7)

Refactor cleanup cronjob with pagination and memory optimization

✨ Enhancement

Grey Divider

Walkthroughs

Description
• Replace AWK-based processing with kubectl pagination API for efficient batch handling
• Process resources in batches of 100 using --chunk-size parameter
• Stream JSON output through jq to minimize memory footprint
• Increase memory limit from 256Mi to 512Mi for batch processing
• Add deletion and error counters for improved observability
Diagram
flowchart LR
  A["kubectl get with --chunk-size=100"] --> B["JSON output to temp file"]
  B --> C["jq stream processing"]
  C --> D["Parse name:namespace:completionTime"]
  D --> E["Date conversion to epoch"]
  E --> F["kubectl delete with counters"]
  F --> G["Summary with DELETED_COUNT and ERROR_COUNT"]

Loading

Grey Divider

File Changes

1. components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml ✨ Enhancement +35/-26

Pagination-based cleanup with memory optimization

• Replace AWK-based date parsing with kubectl --chunk-size=100 pagination API
• Switch from template output to JSON format processed by jq for memory efficiency
• Implement streaming while loop with epoch date conversion for resource filtering
• Add DELETED_COUNT and ERROR_COUNT tracking with informational logging
• Increase memory limits from 256Mi to 512Mi for both requests and limits
• Improve error handling with stderr capture and cleanup of temporary files

components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml


Grey Divider

Qodo Logo

@qodo-for-redhat-appstudio

qodo-for-redhat-appstudio Bot commented May 29, 2026

Copy link
Copy Markdown

Code Review by Qodo

🐞 Bugs (6) 📘 Rule violations (0)

Grey Divider


Action required

1. Tempfiles on read-only FS 🐞 Bug ☼ Reliability ⭐ New
Description
The script now uses mktemp without targeting the writable /var/tmp mount while
readOnlyRootFilesystem: true, so temp file creation (and the kubectl get redirect) can fail and
break the cleanup job. This is a regression vs the internal-production manifest which explicitly
places temp files under /var/tmp.
Code

components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[R45-46]

Relevance

⭐⭐⭐ High

Cleanup cronjob historically used mktemp -p /var/tmp with readOnlyRootFilesystem, implying
/var/tmp is required.

PR-#242
PR-#326

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
In internal-staging, temp files are created with plain mktemp, but the pod mounts only /var/tmp
as writable while the root filesystem is read-only. The internal-production manifest in the same
repo explicitly uses mktemp -p /var/tmp, indicating /var/tmp is the intended scratch location.

components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[44-56]
components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[83-99]
components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[38-40]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The cronjob creates temp files via `mktemp` with no directory, which defaults to `/tmp`. This CronJob runs with `readOnlyRootFilesystem: true` and only mounts an `emptyDir` at `/var/tmp`, so creating temp files in `/tmp` is likely to fail.

### Issue Context
The manifest already mounts `/var/tmp` specifically for scratch space, and the internal-production variant uses `mktemp -p /var/tmp`.

### Fix Focus Areas
- components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[36-56]
- components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[80-99]

### Suggested change
- Set `TMPDIR=/var/tmp` or update both tempfiles to `mktemp -p /var/tmp` (for `KUBECTL_OUTPUT` and `KUBECTL_ERR`).
- (Optional hardening) Add a `trap` to delete temp files on exit.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. Stderr mixed into JSON 🐞 Bug ☼ Reliability
Description
kubectl get ... -o json redirects stderr into the same file as stdout (2>&1), so
warnings/deprecation messages can corrupt the JSON. This can make jq fail and/or cause the
pagination continue-token extraction to break, leading to incomplete cleanup without a clear
failure.
Code

components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[R57-84]

Relevance

⭐⭐⭐ High

Team repeatedly changed this cronjob for kubectl reliability; likely accept separating stderr to
keep JSON parseable.

PR-#242
PR-#244
PR-#246

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
Both manifests redirect kubectl stderr into the JSON output file and then immediately parse the
same file with jq for items and the continue token, so any stderr output will break JSON
parsing/pagination.

components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[56-66]
components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[82-84]
components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[56-66]
components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[82-84]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The script writes both stdout and stderr from `kubectl get ... -o json` into `$BATCH_OUTPUT`, then parses `$BATCH_OUTPUT` as JSON with `jq`. Any stderr output can corrupt the JSON and break processing/pagination.

### Issue Context
The file is later used for `.items[]` processing and `.metadata.continue` extraction.

### Fix
Write stdout (JSON) to `$BATCH_OUTPUT` and stderr to a separate file/stream. On failure, surface stderr in logs:

```bash
BATCH_OUTPUT=$(mktemp -p /var/tmp)
BATCH_ERR=$(mktemp -p /var/tmp)
if ! kubectl ... -o json >"$BATCH_OUTPUT" 2>"$BATCH_ERR"; then
 echo "ERROR: failed to list ${CR_TYPE} resources"
 cat "$BATCH_ERR" >&2
 rm -f "$BATCH_OUTPUT" "$BATCH_ERR"
 exit 1
fi
rm -f "$BATCH_ERR"
```

Also consider checking `jq` exit codes explicitly and failing the job if JSON parsing fails.

### Fix Focus Areas
- components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[56-85]
- components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[56-85]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Parse failure triggers deletion 🐞 Bug ≡ Correctness
Description
If date -d fails to parse a completionTime, the code forces completion_epoch to 0, which makes
the "old enough" check pass and can delete resources despite an unknown completion time. This turns
parsing glitches/data issues into unintended deletions.
Code

components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[R68-72]

Relevance

⭐⭐⭐ High

Prior cronjob work emphasized safety; not deleting on parse failure matches “avoid accidental
deletion” direction.

PR-#242

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
Both manifests explicitly map date parsing failures to epoch 0 and then use that value for the
deletion decision, which can cause deletions when completionTime is not parseable.

components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[66-72]
components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[66-72]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
On date parsing failure, `completion_epoch` becomes `0`, which will almost always satisfy `SECONDS_BACK > completion_epoch` and incorrectly trigger deletion.

### Issue Context
This is especially risky in a cleanup job because it can delete resources you cannot confidently classify as "older than".

### Fix
Treat parse failures as errors and skip (or fail fast). Example:

```bash
if ! completion_epoch=$(date -d "$completion_time" +%s 2>/dev/null); then
 echo "ERROR: Unparseable completionTime for ${cr_name} in ${cr_namespace}: ${completion_time}"
 ERROR_COUNT=$((ERROR_COUNT + 1))
 continue
fi
```

### Fix Focus Areas
- components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[66-79]
- components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[66-79]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


View more (1)
4. Eval command injection risk 🐞 Bug ⛨ Security
Description
The script constructs a shell command string from CR_TYPE/CR_NAMESPACE/LABELS/CONTINUE and executes
it with eval, so metacharacters in those values can execute arbitrary commands. This is an
avoidable injection risk in a privileged automation job.
Code

components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[R50-59]

Relevance

⭐⭐ Medium

No prior repo feedback on avoiding eval; some PRs flagged “Possible security concern” but not this
pattern.

PR-#384

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
Both cronjobs build CMD using unquoted variable expansions and execute it via eval, which will
interpret shell syntax present in those variable values.

components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[49-59]
components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[49-59]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
A kubectl command is assembled into a string and executed via `eval`, which enables shell injection through variable contents.

### Issue Context
Variables are currently interpolated without robust quoting, and `eval` will interpret any metacharacters.

### Fix
Avoid `eval` entirely. Build arguments as a bash array and invoke `kubectl` directly:

```bash
kubectl_args=(get "$CR_TYPE" -n "$CR_NAMESPACE" -l "$LABELS" --limit="$BATCH_SIZE" -o json)
if [ -n "$CONTINUE" ]; then
 kubectl_args+=(--continue="$CONTINUE")
fi

if ! kubectl "${kubectl_args[@]}" >"$BATCH_OUTPUT" 2>"$BATCH_ERR"; then
 ...
fi
```

### Fix Focus Areas
- components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[49-62]
- components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[49-62]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

5. jq errors not detected 🐞 Bug ☼ Reliability ⭐ New
Description
The cleanup loop consumes jq output via process substitution but never checks whether jq
succeeded, so missing jq / JSON parse failures can make the job report "Cleanup complete" with
zero deletions/errors while actually doing no work. This breaks the observability the PR is trying
to add and can leave resources unpruned without a failing exit code.
Code

components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[R57-76]

Relevance

⭐⭐ Medium

Cleanup cronjob reliability fixes common, but no history about checking jq exit status in process
substitution loops.

PR-#242
PR-#246
PR-#244

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
The script feeds the loop from jq via process substitution and has no conditional/exit-path that
handles jq failure, yet it always prints a completion message afterward. This means jq can fail
without being reflected in the job result or counters.

components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[57-79]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The script uses `done < <(jq ...)` but never verifies `jq` executed successfully. If `jq` is missing or the JSON is malformed, the `while` loop will read nothing and the script will still print the final success line.

### Issue Context
This cronjob is intended to be an automated cleanup mechanism; silently doing nothing defeats its purpose and makes operational debugging hard.

### Fix Focus Areas
- components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[35-79]

### Suggested change
- Add an explicit dependency check early: `command -v jq >/dev/null || { echo "ERROR: jq not found"; exit 1; }`.
- Capture and validate jq success before entering the loop, e.g.:
 - `JQ_OUTPUT=$(mktemp -p /var/tmp)`
 - `if ! jq -r '...' "$KUBECTL_OUTPUT" > "$JQ_OUTPUT"; then echo "ERROR: jq failed"; ERROR_COUNT=$((ERROR_COUNT+1)); exit 1; fi`
 - `while ...; do ...; done < "$JQ_OUTPUT"`
 - cleanup temp files via `trap`.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


6. Counters updated in subshell 🐞 Bug ≡ Correctness
Description
DELETED_COUNT and ERROR_COUNT are incremented inside a while loop that is fed by a pipe from jq,
which runs the loop in a bash subshell. The job will still delete resources, but the final "Cleanup
complete" log will always report 0 deletions/errors, breaking the observability this PR is adding.
Code

components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[R65-80]

Relevance

⭐⭐⭐ High

Cronjob hardening emphasized observability; fixing subshell counter bug aligns with prior
reliability-focused edits.

PR-#242

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
Both cronjobs pipe jq output into while ...; do ...; done and increment counters inside that
loop; in bash this loop runs in a subshell, so the final echo uses unchanged parent-shell variables.

components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[33-35]
components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[64-80]
components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[94-94]
components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[64-80]
components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[94-94]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The counters are incremented in a `while` loop that is part of a pipeline (`jq | while ...`), so the loop runs in a subshell and counter updates are lost.

### Issue Context
This manifests as `Deleted: 0, Errors: 0` even when deletions occurred.

### Fix
Use process substitution (or read from a temp file) so the `while` loop runs in the current shell:

```bash
while IFS=: read -r cr_name cr_namespace completion_time; do
 ...
done < <(jq -r '.items[] | select(.status.completionTime != null) | "\(.metadata.name):\(.metadata.namespace):\(.status.completionTime)"' "$BATCH_OUTPUT")
```

### Fix Focus Areas
- components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[64-80]
- components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[64-80]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Previous review results

Review updated until commit f49ddc7

Results up to commit 1742ffb


🐞 Bugs (4) 📘 Rule violations (0) 📎 Requirement gaps (0)


Action required
1. Stderr mixed into JSON 🐞 Bug ☼ Reliability
Description
kubectl get ... -o json redirects stderr into the same file as stdout (2>&1), so
warnings/deprecation messages can corrupt the JSON. This can make jq fail and/or cause the
pagination continue-token extraction to break, leading to incomplete cleanup without a clear
failure.
Code

components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[R57-84]

Relevance

⭐⭐⭐ High

Team repeatedly changed this cronjob for kubectl reliability; likely accept separating stderr to
keep JSON parseable.

PR-#242
PR-#244
PR-#246

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
Both manifests redirect kubectl stderr into the JSON output file and then immediately parse the
same file with jq for items and the continue token, so any stderr output will break JSON
parsing/pagination.

components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[56-66]
components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[82-84]
components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[56-66]
components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[82-84]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The script writes both stdout and stderr from `kubectl get ... -o json` into `$BATCH_OUTPUT`, then parses `$BATCH_OUTPUT` as JSON with `jq`. Any stderr output can corrupt the JSON and break processing/pagination.

### Issue Context
The file is later used for `.items[]` processing and `.metadata.continue` extraction.

### Fix
Write stdout (JSON) to `$BATCH_OUTPUT` and stderr to a separate file/stream. On failure, surface stderr in logs:

```bash
BATCH_OUTPUT=$(mktemp -p /var/tmp)
BATCH_ERR=$(mktemp -p /var/tmp)
if ! kubectl ... -o json >"$BATCH_OUTPUT" 2>"$BATCH_ERR"; then
 echo "ERROR: failed to list ${CR_TYPE} resources"
 cat "$BATCH_ERR" >&2
 rm -f "$BATCH_OUTPUT" "$BATCH_ERR"
 exit 1
fi
rm -f "$BATCH_ERR"
```

Also consider checking `jq` exit codes explicitly and failing the job if JSON parsing fails.

### Fix Focus Areas
- components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[56-85]
- components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[56-85]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. Parse failure triggers deletion 🐞 Bug ≡ Correctness
Description
If date -d fails to parse a completionTime, the code forces completion_epoch to 0, which makes
the "old enough" check pass and can delete resources despite an unknown completion time. This turns
parsing glitches/data issues into unintended deletions.
Code

components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[R68-72]

Relevance

⭐⭐⭐ High

Prior cronjob work emphasized safety; not deleting on parse failure matches “avoid accidental
deletion” direction.

PR-#242

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
Both manifests explicitly map date parsing failures to epoch 0 and then use that value for the
deletion decision, which can cause deletions when completionTime is not parseable.

components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[66-72]
components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[66-72]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
On date parsing failure, `completion_epoch` becomes `0`, which will almost always satisfy `SECONDS_BACK > completion_epoch` and incorrectly trigger deletion.

### Issue Context
This is especially risky in a cleanup job because it can delete resources you cannot confidently classify as "older than".

### Fix
Treat parse failures as errors and skip (or fail fast). Example:

```bash
if ! completion_epoch=$(date -d "$completion_time" +%s 2>/dev/null); then
 echo "ERROR: Unparseable completionTime for ${cr_name} in ${cr_namespace}: ${completion_time}"
 ERROR_COUNT=$((ERROR_COUNT + 1))
 continue
fi
```

### Fix Focus Areas
- components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[66-79]
- components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[66-79]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Eval command injection risk 🐞 Bug ⛨ Security
Description
The script constructs a shell command string from CR_TYPE/CR_NAMESPACE/LABELS/CONTINUE and executes
it with eval, so metacharacters in those values can execute arbitrary commands. This is an
avoidable injection risk in a privileged automation job.
Code

components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[R50-59]

Relevance

⭐⭐ Medium

No prior repo feedback on avoiding eval; some PRs flagged “Possible security concern” but not this
pattern.

PR-#384

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
Both cronjobs build CMD using unquoted variable expansions and execute it via eval, which will
interpret shell syntax present in those variable values.

components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[49-59]
components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[49-59]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
A kubectl command is assembled into a string and executed via `eval`, which enables shell injection through variable contents.

### Issue Context
Variables are currently interpolated without robust quoting, and `eval` will interpret any metacharacters.

### Fix
Avoid `eval` entirely. Build arguments as a bash array and invoke `kubectl` directly:

```bash
kubectl_args=(get "$CR_TYPE" -n "$CR_NAMESPACE" -l "$LABELS" --limit="$BATCH_SIZE" -o json)
if [ -n "$CONTINUE" ]; then
 kubectl_args+=(--continue="$CONTINUE")
fi

if ! kubectl "${kubectl_args[@]}" >"$BATCH_OUTPUT" 2>"$BATCH_ERR"; then
 ...
fi
```

### Fix Focus Areas
- components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[49-62]
- components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[49-62]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended
4. Counters updated in subshell 🐞 Bug ≡ Correctness
Description
DELETED_COUNT and ERROR_COUNT are incremented inside a while loop that is fed by a pipe from jq,
which runs the loop in a bash subshell. The job will still delete resources, but the final "Cleanup
complete" log will always report 0 deletions/errors, breaking the observability this PR is adding.
Code

components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[R65-80]

Relevance

⭐⭐⭐ High

Cronjob hardening emphasized observability; fixing subshell counter bug aligns with prior
reliability-focused edits.

PR-#242

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
Both cronjobs pipe jq output into while ...; do ...; done and increment counters inside that
loop; in bash this loop runs in a subshell, so the final echo uses unchanged parent-shell variables.

components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[33-35]
components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[64-80]
components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[94-94]
components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[64-80]
components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[94-94]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The counters are incremented in a `while` loop that is part of a pipeline (`jq | while ...`), so the loop runs in a subshell and counter updates are lost.

### Issue Context
This manifests as `Deleted: 0, Errors: 0` even when deletions occurred.

### Fix
Use process substitution (or read from a temp file) so the `while` loop runs in the current shell:

```bash
while IFS=: read -r cr_name cr_namespace completion_time; do
 ...
done < <(jq -r '.items[] | select(.status.completionTime != null) | "\(.metadata.name):\(.metadata.namespace):\(.status.completionTime)"' "$BATCH_OUTPUT")
```

### Fix Focus Areas
- components/internal-services/internal-production/cronjob/cleanup-internal-requests-pipelineruns.yaml[64-80]
- components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[64-80]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Qodo Logo

@mmalina

mmalina commented May 29, 2026

Copy link
Copy Markdown
Contributor

Where is your Assisted-by line? ;)

CONTINUE=""
while true; do
# Build kubectl command with limit and continue token
CMD="kubectl get ${CR_TYPE} -n ${CR_NAMESPACE} -l ${LABELS} --limit=${BATCH_SIZE}"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kubectl get does not have a --limit flag unfortunately, it has a --chunk-size which still returns all items, but in chunks, which I don't think is what we want here.

Perhaps we can use something like:

kubectl get --raw "/api/v1/{cr_type}?limit=${BATCH_SIZE}"

and parse the raw response, to fetch the continue token and do the whole pagination manually.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to chunk-size.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using chunk-size would still keep all items in memory in the $output var, so we would hit the same issue here I think.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we would hit the same issue here

I guess that depends on what was eating the resources, right? the only improvement with --chunk-size is that kubectl itself should perform better, but the rest will be exactly the same.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how chunk size works, if it loads everything and just display in chunks, probably the issue will continue. If this does not solve, we could try adding the completionTIme as a customfiled selector to the CR` so we might be able to "pre-filter" the internalrequests: and not need the mechanics we are doing now:
Reference: https://kubernetes.io/docs/concepts/overview/working-with-objects/field-selectors/#custom-resources-fields

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field selector supports ==, != so "maybe" we might be able to filter with the desired date (I did not test, though).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seem chunk-size will solve it:

The Mechanism

Instead of making one massive, stressful API call that requests all 100,000 items at once (which could time out or crash the API server), kubectl breaks the request into 100 smaller sequential batches.

kubectl asks the API server for the first 1,000 items.

The server sends them back along with a continue token.

kubectl prints those 1,000 items to your terminal, then immediately uses the token to ask for the next 1,000.

This loops automatically until all 100,000 items are printed.

The Bottom Line: --chunk-size is a tool for performance and reliability, not for truncating your terminal output.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, so this confirms what I said earlier:

I guess that depends on what was eating the resources, right? the only improvement with --chunk-size is that kubectl itself should perform better, but the rest will be exactly the same.

CONTINUE=""
while true; do
# Build kubectl command with limit and continue token
CMD="kubectl get ${CR_TYPE} -n ${CR_NAMESPACE} -l ${LABELS} --limit=${BATCH_SIZE}"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--limit seems to be an hallucination

➜  kubectl version
Client Version: v1.35.5
Kustomize Version: v5.7.1
Server Version: v1.35.0

➜  kubectl get pods --limit=4
error: unknown flag: --limit
See 'kubectl get --help' for usage.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally! Moved to chunk-size.

BATCH_OUTPUT=$(mktemp -p /var/tmp)
if ! eval "$CMD" > "$BATCH_OUTPUT" 2>&1; then
echo "ERROR: failed to list ${CR_TYPE} resources"
rm -f "$BATCH_OUTPUT"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why bother removing the file? it's a container.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code changed but I'd say nothing wrong removing the file here. Happy to remove the cleanup lines if you prefer.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say it's just redundant - we just never do that. but if you prefer to keep it, fine with me.


# Check for continuation token
CONTINUE=$(jq -r '.metadata.continue // empty' "$BATCH_OUTPUT")
rm -f "$BATCH_OUTPUT"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why bother removing the file? it's a container.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code changed but I'd say nothing wrong removing the file here. Happy to remove the cleanup lines if you prefer.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say it's just redundant - we just never do that. but if you prefer to keep it, fine with me.

fi

# Process each item in the batch individually to minimize memory
jq -r '.items[] | select(.status.completionTime != null) | "\(.metadata.name):\(.metadata.namespace):\(.status.completionTime)"' "$BATCH_OUTPUT" | \

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jq | while runs the while loop. DELETED_COUNT and ERROR_COUNT will always be 0 in the final log, regardless of actual deletions. Since it runs in a subshell.
Guess something like:

while IFS=: read -r cr_name cr_namespace completion_time; do
done < <(jq -r <> "$BATCH_OUTPUT")

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed.

echo "ERROR: namespace=${namespace} ${output}"
# Check if old enough to delete
if [ "$SECONDS_BACK" -gt "$completion_epoch" ]; then
if kubectl delete "${CR_TYPE}" "${cr_name}" -n "${cr_namespace}" --wait=false --timeout=30s 2>&1; then

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old code outputs the delete log. Now we don't show the output, we won't see the reason it failed.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it's displayed again.

@davidmogar davidmogar marked this pull request as draft May 29, 2026 13:38
@davidmogar

Copy link
Copy Markdown
Member Author

Where is your Assisted-by line? ;)

The PR can be garbage and I still need to work on it and test it, but the commit contains Generated-By ;)

@davidmogar davidmogar marked this pull request as ready for review June 1, 2026 07:24
@openshift-ci openshift-ci Bot requested review from FilipNikolovski and mmalina June 1, 2026 07:24
@qodo-for-redhat-appstudio

qodo-for-redhat-appstudio Bot commented Jun 1, 2026

Copy link
Copy Markdown

Code review by qodo was updated up to the latest commit f49ddc7

@davidmogar

Copy link
Copy Markdown
Member Author

Sean kindly tested this PR (thanks Sean) and got a successful run:

╰─❯ bash /tmp/script.sh
INFO: Starting cleanup for pipelineruns in default older than now
INFO: namespace=default [pipelinerun.tekton.dev](http://pipelinerun.tekton.dev/) "dummy-cleanup-test-45dj7" deleted from default namespace
INFO: namespace=default [pipelinerun.tekton.dev](http://pipelinerun.tekton.dev/) "dummy-cleanup-test-4kjr6" deleted from default namespace
INFO: namespace=default [pipelinerun.tekton.dev](http://pipelinerun.tekton.dev/) "dummy-cleanup-test-6m9r5" deleted from default namespace
INFO: namespace=default [pipelinerun.tekton.dev](http://pipelinerun.tekton.dev/) "dummy-cleanup-test-b8bhw" deleted from default namespace
INFO: namespace=default [pipelinerun.tekton.dev](http://pipelinerun.tekton.dev/) "dummy-cleanup-test-d2hch" deleted from default namespace
INFO: namespace=default [pipelinerun.tekton.dev](http://pipelinerun.tekton.dev/) "dummy-cleanup-test-djtsg" deleted from default namespace
INFO: namespace=default [pipelinerun.tekton.dev](http://pipelinerun.tekton.dev/) "dummy-cleanup-test-vjpjj" deleted from default namespace
INFO: namespace=default [pipelinerun.tekton.dev](http://pipelinerun.tekton.dev/) "dummy-cleanup-test-wgh2s" deleted from default namespace
INFO: namespace=default [pipelinerun.tekton.dev](http://pipelinerun.tekton.dev/) "dummy-cleanup-test-whb5g" deleted from default namespace
INFO: namespace=default [pipelinerun.tekton.dev](http://pipelinerun.tekton.dev/) "dummy-cleanup-test-wlnm5" deleted from default namespace
INFO: Cleanup complete. Deleted: 10, Errors: 0

Comment on lines +45 to +46
KUBECTL_OUTPUT=$(mktemp)
KUBECTL_ERR=$(mktemp)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. Tempfiles on read-only fs 🐞 Bug ☼ Reliability

The script now uses mktemp without targeting the writable /var/tmp mount while
readOnlyRootFilesystem: true, so temp file creation (and the kubectl get redirect) can fail and
break the cleanup job. This is a regression vs the internal-production manifest which explicitly
places temp files under /var/tmp.
Agent Prompt
### Issue description
The cronjob creates temp files via `mktemp` with no directory, which defaults to `/tmp`. This CronJob runs with `readOnlyRootFilesystem: true` and only mounts an `emptyDir` at `/var/tmp`, so creating temp files in `/tmp` is likely to fail.

### Issue Context
The manifest already mounts `/var/tmp` specifically for scratch space, and the internal-production variant uses `mktemp -p /var/tmp`.

### Fix Focus Areas
- components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[36-56]
- components/internal-services/internal-staging/cronjob/cleanup-internal-requests-pipelineruns.yaml[80-99]

### Suggested change
- Set `TMPDIR=/var/tmp` or update both tempfiles to `mktemp -p /var/tmp` (for `KUBECTL_OUTPUT` and `KUBECTL_ERR`).
- (Optional hardening) Add a `trap` to delete temp files on exit.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

@seanconroy2021

Copy link
Copy Markdown
Member

lgtm :)

@FilipNikolovski

Copy link
Copy Markdown
Contributor

lgtm

@mmalina

mmalina commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Where is your Assisted-by line? ;)

The PR can be garbage and I still need to work on it and test it, but the commit contains Generated-By ;)

I know, but I thought we were specifically required to use Assisted-by:

@openshift-ci

openshift-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidmogar, theflockers

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Replace AWK-based processing with kubectl pagination API to handle large
resource sets more efficiently. Process resources in batches of 100 using
continuation tokens, and stream through jq to minimize memory footprint.
Increase memory limit from 256Mi to 512Mi to accommodate batch processing.
Add deletion and error counters for better observability.

Generated-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Signed-off-by: David Moreno García <damoreno@redhat.com>
@openshift-ci

openshift-ci Bot commented Jun 30, 2026

Copy link
Copy Markdown

New changes are detected. LGTM label has been removed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants